1.  INTRODUCTION


In  combining  bodies  of  information,  if  some  of  the  data  is  common  to  both  sets,  then  the  sets 
are  correlated,  and  the  potential  amount  of  information  is  diminished  as  a  result  of  the  correlation. 
For  example,  when  data  is  expensive  it  is  common  for  one  set  of  data  to  serve  as  the  basis  for 
many  studies.  Combining  the  results  of  these  studies  as  if  they  were  independent  could  lead  to 
ill-founded  confidence  intervals  for  the  final  estimator.  In  some  situations  the  measurements  may 
be  correlated.  Pollution  measurements  of  a  body  of  water  will  be  correlated  across  both  time  and 
location.  Taking  many  water  samples  (as  opposed  to  one)  at  a  given  location  and  time  does  not 
necessarily  increase  the  information. 

In  many  instances,  there  is  a  need  to  extract  information  from  data  that  is  self-correlated.  In 
some  situations  the  problem  of  correlation  is  solved  by  sampling  at  distances  over  which  the 
correlation  is  considered  negligible.  This  is  an  option  when  the  size  of  the  sampling  window  can 
be  controlled  by  the  designer.  In  other  situations,  there  is  a  tradeoff  between  the  size  of  the 
sampling  window  and  cost;  thus,  it  is  useful  to  have  a  method  available  to  gain  insight  into  these 
tradeoffs.  It  is  the  purpose  of  this  report  to  provide  some  insight  and  clarify  some  of  the  issues 
associated  with  this  problem. 

2.  BACKGROUND 

In  signal  processing  the  amount  of  information  that  can  be  extracted  from  a  signal  is  a 
function  of  the  variance,  the  number  of  samples  taken,  the  correlation  of  the  signal,  and  the 
observation  interval.  When  the  correlation  time  of  the  signal  is  longer  than  the  sampling  window, 
it  is  possible  for  an  estimate  to  contain  a  large  bias.  This  situation  can  arise  when  an  incoming 
projectile  is  detected  a  short  distance  from  its  intended  target  and  the  autocorrelation  time  of  the 
measurement  noise  is  longer  than  the  time  remaining  until  impact.  For  certain  noise  functions 
we  would  like  to  know  how  much  information  can  be  extracted  from  the  signal,  what  is  a  good 
sampling  rate,  and  how  much  gain  is  there  in  extending  the  detection  distance  (or  increasing  the 
observation  window).  Usually  the  dimension  of  correlation  is  time;  however,  it  can  be  proximity 
along  any  dimension.  If  the  correlation  is  considered  a  nuisance,  an  adequate  model  for  the 
expectation  of  two  observations  is  an  exponential  correlation  model  (Young  and  Jakeman  1979; 
Seber and Wild 1989). If $v$ is the variable of correlation, then $\rho$ is the parameter in the correlation
function. The equation,

$$\operatorname{Cor}(v_i, v_j) = e^{-\rho\, d(v_i, v_j)} \tag{1}$$

gives the correlation between observations separated by $d(v_i, v_j)$, where $d$ is the appropriate
distance function. Typically, variables are correlated across time or physical location.

When  it  is  not  feasible  to  partition  the  sample  space  into  groups  or  clusters  that  are  not  highly 
correlated,  the  effects  of  correlation  must  be  addressed.  An  approach  to  this  issue  is  to  consider 
the  tradeoffs  between  the  cost  of  an  observation  and  the  gain  of  information  due  to  the 
observation.  The  gain  of  information  is  indicated  by  the  reduction  of  the  covariance.  Thus,  if  an 
observation  leads  to  a  significant  reduction  of  the  covariance  of  the  estimate  then  it  is  cost 
effective.  One  approach  to  this  question  is  to  find  the  reduction  in  variance  for  different  sampling 
methods and then look at the performance/cost questions. The problem considered in this report
is that of estimating the mean of a set of correlated data.

3.  THE  MEAN  AS  AN  ESTIMATOR 

Consider  the  problem  of  estimating  the  mean  over  a  fixed  time  interval  when  the 
measurement  noise  is  correlated  across  time.  The  goal  is  to  quantify  the  amount  of  information 
extracted for different sampling rates. As the sampling rate increases, the correlation between
the observations will increase. The discussion focuses on the decrease in the variance as a
function of increase in sampling rate. The problem stated mathematically follows. Find $\operatorname{Var}(\bar{Y})$
where

$$\bar{Y} = \sum_{i=1}^{N} \frac{1}{N} Y_i, \qquad Y_i = Y + V_i, \qquad V_i \sim N(0, \sigma^2), \qquad \operatorname{Cor}(V_i, V_j) = e^{-\rho |t_i - t_j|}, \tag{2}$$

and $t_i$ is the time at which observation $i$ is taken.
Let the total amount of time available be $T$, and assume that $N$ equally spaced observations will
be taken. The correlation between successive observations, $\alpha$, is defined by the following
equation

$$\alpha = e^{-\rho T/(N-1)}. \tag{3}$$

Correlation  implies  that  the  same  thing  is  being  measured  on  separate  occasions,  and  thus 
reduces  the  potential  information  in  a  sample. 

For uncorrelated observations, the inner product associated with the measurements is $I_N \sigma^2$.
For simplicity, $\sigma$ will be assumed to be one for the rest of the discussion. Let $X$ be an
$N$-dimensional vector of ones and $\vec{Y}$ be the vector of observations; then the estimator of the
average of the $Y_i$ is

$$\bar{Y} = \frac{1}{N} X' \vec{Y}$$

and

$$\operatorname{Var}(\bar{Y}) = \frac{1}{N} X' I_N X \frac{1}{N} = \frac{1}{N}. \tag{4}$$


When correlation across the observations exists, the observations are not independent. In
this situation the covariance matrix of the observations is found by using Equation 3. Since $\alpha$ is
the correlation between observations one time step from each other, the correlation between $V_i$
and $V_j$ is $\alpha^{|i-j|}$. The correlation matrix, $\Sigma$, associated with the set of measurements is

$$\Sigma = \begin{pmatrix}
1 & \alpha & \alpha^2 & \cdots & \alpha^{N-1} \\
\alpha & 1 & \alpha & \cdots & \alpha^{N-2} \\
\alpha^2 & \alpha & 1 & \cdots & \alpha^{N-3} \\
\alpha^3 & \alpha^2 & \alpha & \cdots & \alpha^{N-4} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\alpha^{N-1} & \alpha^{N-2} & \alpha^{N-3} & \cdots & 1
\end{pmatrix}$$
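As an illustration, the matrix above can be built directly from its definition. The sketch below is not from the report; the function name `correlation_matrix` and the default `rho=1.0` are hypothetical, and it assumes equally spaced observations with the first and last at the ends of the window, so that the spacing is T/(N-1).

```python
import math

def correlation_matrix(n, t, rho=1.0):
    """Return the n x n matrix with entries alpha**|i-j|.

    Assumption: alpha = exp(-rho * t / (n - 1)), i.e. n equally spaced
    observations spanning a window of length t.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    alpha = math.exp(-rho * t / (n - 1))
    return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]

# Four observations over one correlation time unit.
sigma = correlation_matrix(4, 1.0)
for row in sigma:
    print("  ".join(f"{x:.4f}" for x in row))
```

The corner entry is the correlation between the first and last observations, which equals exp(-rho * t) regardless of how many observations are placed between them.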

As $\alpha$ approaches 1, the correlation matrix approaches a singular matrix and, in the limit, ceases
to be positive definite. One way to see this is to consider sweeping the matrix on the (1,1) element.
The sweep operator can be thought of as a variation of the Gram-Schmidt process. When the matrix
is swept on the (1,1) element, the first row becomes orthogonal to the space spanned by the
remaining altered vectors. In terms of inner products, the projection of the first observation onto
another observation is removed from each observation in turn. This operation removes all the
information contained in the first observation from the others. If each row contains values close
to 1, very little information is left after the first row is processed. For a discussion of the
sweep operator and its implementation see Dempster (1969) or Seber (1977).
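The effect described above can be seen numerically. The sketch below is an illustration, not the report's implementation. It computes only the part of a sweep on the (1,1) element that updates the remaining rows and columns (a Schur complement), and the helper names `ar_matrix` and `sweep_first` are hypothetical. The diagonal of the result is the variance left in each remaining observation once the first observation has been projected out.

```python
def ar_matrix(n, alpha):
    """Correlation matrix with entries alpha**|i-j| (hypothetical helper)."""
    return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]

def sweep_first(matrix):
    """Remove the projection onto observation 1 from the other observations.

    Returns the (n-1) x (n-1) residual matrix
    m[i][j] - m[i][0] * m[0][j] / m[0][0] for i, j >= 1.
    """
    pivot = matrix[0][0]
    n = len(matrix)
    return [[matrix[i][j] - matrix[i][0] * matrix[0][j] / pivot
             for j in range(1, n)]
            for i in range(1, n)]

for alpha in (0.5, 0.9, 0.99):
    residual = sweep_first(ar_matrix(3, alpha))
    # Residual variances are 1 - alpha**2 and 1 - alpha**4.
    print(alpha, [round(residual[i][i], 4) for i in range(2)])
```

For alpha = 0.99 the residual variances are about 0.02 and 0.04, so almost nothing remains in the second and third observations once the first has been accounted for.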


$\operatorname{Var}(\bar{Y})$, with $\Sigma$ as the covariance of the observations, is $\frac{1}{N} X' \Sigma X \frac{1}{N}$. Since $X$ is a
column of ones, this operation adds the values of each column and then adds the columns and
divides the result by $N^2$. In this case, to find the variance of the estimate, the elements of the
matrix are added and then that sum is divided by the square of the number of observations.
Examination of the matrix shows there is one diagonal of ones of length $N$, there are two
diagonals of length $N-1$ filled with $\alpha$, two diagonals of length $N-2$ filled with $\alpha^2$, and so on.
Therefore,

$$\operatorname{Var}(\bar{Y}) = \frac{1}{N^2} \left( N + 2 \sum_{i=1}^{N-1} (N-i)\,\alpha^{i} \right). \tag{5}$$
If $\alpha$ is 0, the formula does indeed reduce to the previous case of no correlation. If $\alpha$ is
1, there is no reduction in the variance of the estimate by taking more data, because then

$$N + 2 \sum_{i=1}^{N-1} (N-i) = N + 2 \sum_{i=1}^{N-1} i = N^2 \tag{6}$$

and using Equation 5, $\operatorname{Var}(\bar{Y}) = 1$. The value of $\alpha$ depends on $\rho$, the time period, $T$, and the
number of observations, $N$. Rewriting Equation 5 to reflect this dependence yields

$$\operatorname{Var}(\bar{Y}) = \frac{1}{N^2} \left( N + 2 \sum_{i=1}^{N-1} (N-i)\, e^{-\rho T i/(N-1)} \right) \tag{7}$$

as the formula of interest.
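Equation 7 is straightforward to evaluate. The sketch below is an illustration rather than the program used for the report. The function name `var_mean` is hypothetical, unit noise variance is assumed, and the window length `t` is taken to be the product of the correlation parameter and T, that is, the window length in correlation time units.

```python
import math

def var_mean(n, t):
    """Variance of the equally weighted mean of n correlated observations.

    Assumptions: unit noise variance, n equally spaced observations with
    the first and last at the ends of a window of t correlation time
    units, so adjacent observations have correlation exp(-t / (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two observations")
    alpha = math.exp(-t / (n - 1))
    total = n + 2.0 * sum((n - i) * alpha ** i for i in range(1, n))
    return total / n ** 2

# Evaluate a few window lengths and sample counts.
for t in (0.125, 1.0, 5.0):
    print(t, [round(var_mean(n, t), 3) for n in (2, 4, 8, 16)])
```

For two observations the formula collapses to (1 + exp(-t)) / 2, which is about 0.684 for a window of one correlation time unit.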

Using Equation 7, $\operatorname{Var}(\bar{Y})$ can be calculated from known values of $T$, $\rho$, and $N$. In this
case, it is possible to reduce the number of variables by expressing $T$ in terms of $\rho$. When $\rho$ is
small, the correlation will fall off slowly over time. In discussing different processes, the
correlation times of the processes are typically compared. The correlation time of the process
is defined as $1/\rho$. For the remainder of this report the signal length or sampling window will be
in correlation time units. Using these time units, Equation 7 can be calculated from two
variables: the number of correlation time units and the number of observations. In evaluating
this formula, the result will indicate the reduction of uncertainty for a sampling window length, in
correlation time units, and a given number of observations within the sampling window. The next
task is to evaluate this formula at some interesting points and make some observations about the
behavior of $\operatorname{Var}(\bar{Y})$ as $T$ and $N$ change. Table 1 shows $\operatorname{Var}(\bar{Y})$ evaluated at the indicated
values of $N$ and $T$.

Table 1.  $\operatorname{Var}(\bar{Y})$ at Values of $T$ and $N$

Sampling Interval (T),               Number of Observations (N)
in 1/rho Time Units       2      4      8     16     32     64    128    256
----------------------------------------------------------------------------
        0.125           .941   .950   .954   .957   .958   .959   .959   .959
        0.25            .889   .904   .913   .917   .919   .920   .921   .921
        0.5             .803   .822   .837   .844   .848   .850   .851   .852
        0.75            .736   .753   .770   .780   .785   .788   .789   .790
        1.0             .684   .693   .712   .723   .729   .733   .734   .735
        1.5             .611   .597   .615   .628   .635   .639   .641   .642
        2.0             .568   .525   .539   .552   .560   .563   .566   .567
        3.0             .525   .428   .430   .440   .447   .451   .453   .454
        4.0             .509   .369   .357   .364   .370   .373   .375   .376
        5.0             .503   .331   .306   .309   .314   .317   .319   .320

Table 1 shows that there is an optimal sampling rate for the calculation of the mean. The
increase in the variance as a result of oversampling seems to approach its maximum value when
the sampling rate is 512 observations per correlation time. This can be seen in row 1 of the
table: when the signal duration is one-eighth of the correlation time, an increase in the number
of samples beyond 64 does not increase the variance of the mean further. Equation 7 was used to
find the value of $N$ that corresponded to the minimum variance for sampling intervals of 1 to 10
correlation units; these are displayed in Table 2 along with the variance obtained when the number of
observations is twice the sampling interval. Examination of these values indicates that a good rule
for selecting the optimal number of observations is: pick two if the signal has a length of less than
one correlation time unit; otherwise set the number of samples to twice the number of correlation
time units in the sampling window.
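The search for the variance-minimizing number of observations can be sketched as a brute-force scan of Equation 7. This is an illustration under stated assumptions, not the report's program. The names `var_mean` and `best_n` and the cap `n_max=200` are hypothetical, noise variance is taken as one, and the window length is in correlation time units.

```python
import math

def var_mean(n, t):
    """Equation 7 with unit noise variance; t is in correlation time units."""
    alpha = math.exp(-t / (n - 1))
    total = n + 2.0 * sum((n - i) * alpha ** i for i in range(1, n))
    return total / n ** 2

def best_n(t, n_max=200):
    """Return (n, variance) minimizing Equation 7 over 2 <= n <= n_max."""
    candidates = ((n, var_mean(n, t)) for n in range(2, n_max + 1))
    return min(candidates, key=lambda pair: pair[1])

# Compare the minimizing n with the rule of thumb n = 2 * t.
for t in (1, 2, 3, 4, 5):
    n, v = best_n(t)
    print(t, n, round(v, 4), 2 * t, round(var_mean(2 * t, t), 4))
```

For longer windows the variance is very flat near its minimum, so neighboring values of n differ only in the fourth decimal place. This flatness is why the simple rule of two observations per correlation time unit loses little.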

Oversampling,  when  using  the  mean  as  an  estimator,  inflates  the  variance  of  the  estimate  by 
up  to  7%  of  its  minimum.  The  consequences  of  setting  the  sampling  rate  too  low  are  much  worse 
than  those  associated  with  a  high  sampling  rate.  For  correlated  data,  the  mean  is  not  the  best 
estimate  except  for  the  case  of  two  observations.  Thompson  (1991)  discusses  a  method  to  find 
the best weighting factor to associate with correlated observations. Using the optimal weights, the
variance  will  decrease  as  a  function  of  N,  unlike  the  equal  weight  case  presented  in  Table  1  and 
Table  2. 

Table 2.  Minimum Variance of Equation 7

Time (T)    Minimizing N    Variance    N = 2*Time    Variance
--------------------------------------------------------------
    1             2          .6839           2         .6839
    2             4          .5253           4         .5253
    3             5          .4256           6         .4264
    4             7          .3567           8         .3572
    5             9          .3061          10         .3063
    6            12          .2676          12         .2676
    7            14          .2373          14         .2373
    8            17          .2130          16         .2131
    9            20          .1932          18         .1932
   10            24          .1766          20         .1767

4.  OPTIMAL  WEIGHTS

Optimal weights are those that yield a minimum variance unbiased linear estimate. The
weight for an observation is inversely proportional to the variance associated with an observation
if the observations are independent; thus, if the variances associated with each observation are
the same, the weights will be equal and the mean will be the optimal estimator. Each weight
indicates the relative value of each observation. A more formal statement of this is: the optimal
weights define the inner product that minimizes the error when the observations are projected
onto a set of independent variables. Each weight is the Lagrange multiplier associated with the
observation.

The  following  discussion  assumes  a  fixed  sampling  window  with  the  first  two  observations 
taken  at  the  extremes  of  the  interval.  The  intent  is  to  demonstrate  that  for  highly  correlated 
observations  an  increase  in  the  sampling  rate  may  not  result  in  a  meaningful  decrease  of  the 
variance;  thus  taking  additional  observations  may  not  increase  the  useful  information  in  a  sample. 
Since  all  pairwise  correlations  must  be  considered,  it  is  parsimonious  to  consider  the  change  from 
two  to  three  observations.  The  effects  of  taking  an  additional  sample  will  be  further  diminished 
if  the  sample  is  also  correlated  with  more  distant  neighbors;  however,  the  ideas  for  analysis  are 
the  same  as  those  in  the  change  from  two  to  three  observations. 

When  using  the  optimal  estimator,  the  addition  of  a  data  point  always  reduces  the  variance 
of  the  estimator.  To  illustrate  this,  the  change  in  the  variance  of  the  optimal  estimator  will  be 
investigated when $N$ goes from 2 to 3. A decrease indicates that more information can be
extracted from any subinterval given the optimal weights. Assuming a correlation of $\alpha^2$ between
two observations, each has the same weight (0.5). The variance is given by Thompson (1991) as

$$\operatorname{Var}(\bar{Y}) = \frac{1 + \alpha^2}{2}. \tag{8}$$


Next,  assume  that  an  additional  observation  had  been  taken  at  the  midpoint  of  the  interval.  The 
formulas  for  finding  the  three  optimal  weights  are  given  by  Thompson.  Using  the  formula  for 
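three correlated observations, let $\sigma_1 = \sigma_2 = \sigma_3 = 1$, $\rho_{12} = \rho_{23} = \alpha$, and $\rho_{13} = \alpha^2$. Then

$$k_1 + k_2 + k_3 = 1, \qquad
\begin{pmatrix} 2(1-\alpha^2) & 1-\alpha^2 \\ 1-\alpha^2 & 2(1-\alpha) \end{pmatrix}
\begin{pmatrix} k_1 \\ k_2 \end{pmatrix} =
\begin{pmatrix} 1-\alpha^2 \\ 1-\alpha \end{pmatrix}. \tag{9}$$

Solving Equation 9 leads to the following values,

$$k_1 = \frac{1}{3-\alpha}, \qquad k_2 = \frac{1-\alpha}{3-\alpha}, \qquad k_3 = \frac{1}{3-\alpha}. \tag{10}$$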
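The step from Equation 9 to Equation 10 can be written out as follows. This is a sketch that assumes Equation 9 is the two-equation system in $k_1$ and $k_2$ obtained after eliminating $k_3 = 1 - k_1 - k_2$, and that $0 \le \alpha < 1$ so the common factors $1-\alpha^2$ and $1-\alpha$ can be divided out.

```latex
\begin{align*}
2(1-\alpha^2)\,k_1 + (1-\alpha^2)\,k_2 &= 1-\alpha^2
  &&\Rightarrow\quad k_2 = 1 - 2k_1, \\
(1-\alpha^2)\,k_1 + 2(1-\alpha)\,k_2 &= 1-\alpha
  &&\Rightarrow\quad (1+\alpha)\,k_1 + 2k_2 = 1, \\
(1+\alpha)\,k_1 + 2(1 - 2k_1) &= 1
  &&\Rightarrow\quad k_1 = \frac{1}{3-\alpha}, \\
k_2 = 1 - 2k_1 &= \frac{1-\alpha}{3-\alpha},
  &&\phantom{\Rightarrow}\quad k_3 = 1 - k_1 - k_2 = \frac{1}{3-\alpha}.
\end{align*}
```

The two end observations receive equal weight by symmetry, and the weight of the middle observation vanishes as $\alpha$ approaches 1.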


The value of each observation is proportional to the weight associated with it. The added
observation, with weight $k_2$, will have a small value when $\alpha$ is close to 1. The following formula
gives the variance of the estimator.

$$\operatorname{Var}(\bar{Y}) = k_1^2 + k_2^2 + k_3^2 + 2 k_1 k_2 \alpha + 2 k_2 k_3 \alpha + 2 k_1 k_3 \alpha^2. \tag{11}$$
Using this formula with the values of the weights plugged in gives

$$\operatorname{Var}(\bar{Y}) = \frac{2 + (1-\alpha)^2 + 4(1-\alpha)\alpha + 2\alpha^2}{(3-\alpha)^2} = \frac{1+\alpha}{3-\alpha}. \tag{12}$$


As $\alpha$ approaches 1, $\operatorname{Var}(\bar{Y})$ goes to 1. The addition of an observation that is highly correlated
with its neighbors seems pointless as it will have very little influence on the estimate and will not
significantly increase the information extracted from the signal. The exact amount of the decrease
in the variance can be found by subtracting the right side of Equation 12 from the right side of
Equation 8. This is done in the following:

$$\frac{1+\alpha^2}{2} - \frac{1+\alpha}{3-\alpha} = \frac{(1+\alpha^2)(3-\alpha) - 2(1+\alpha)}{2(3-\alpha)} = \frac{(1-\alpha)^3}{2(3-\alpha)}. \tag{13}$$


Since both the numerator and denominator are positive for $0 < \alpha < 1$, the information gain is
positive. If $\alpha$ is 0.9, the gain due to the extra observation is 0.000238; for an $\alpha$ of 0.5 the gain
would be 0.025. Correlated errors drastically reduce the information contained in the set of
observations.
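The two-versus-three observation comparison can be checked numerically. The sketch below is an illustration with hypothetical function names. It evaluates the three-observation weights and the variance expression term by term, then compares the difference from the two-observation variance with the closed form (1 - alpha)^3 / (2 (3 - alpha)).

```python
def weights3(alpha):
    """Optimal weights for end, middle, end observations (unit variances)."""
    d = 3.0 - alpha
    return 1.0 / d, (1.0 - alpha) / d, 1.0 / d

def var3(alpha):
    """Variance of the weighted estimate, summed term by term."""
    k1, k2, k3 = weights3(alpha)
    return (k1 * k1 + k2 * k2 + k3 * k3
            + 2 * k1 * k2 * alpha + 2 * k2 * k3 * alpha
            + 2 * k1 * k3 * alpha ** 2)

def var2(alpha):
    """Variance with only the two end observations (correlation alpha**2)."""
    return (1.0 + alpha ** 2) / 2.0

def gain(alpha):
    """Reduction in variance from adding the midpoint observation."""
    return var2(alpha) - var3(alpha)

for alpha in (0.5, 0.9, 0.99):
    closed_form = (1 - alpha) ** 3 / (2 * (3 - alpha))
    print(alpha, gain(alpha), closed_form)
```

The gain falls off as the cube of (1 - alpha), which is why a midpoint observation contributes almost nothing once adjacent observations are highly correlated.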

Finding  the  optimal  weights  for  a  set  of  observations  involves  finding  the  inverse  of  a  matrix 
that  is  ill-conditioned  if  the  correlation  is  high.  In  many  cases,  it  may  be  numerically  impossible 
to  perform  this  operation.  Optimal  weights  can  be  found  using  the  method  discussed  in  Case  7 
of  Thompson  (1991).  The  only  reasonable  way  to  calculate  the  weights  is  using  software  for 
matrix  operations.  A  program  was  devised  to  evaluate  the  variance  when  optimal  weights  are 
used  for  estimation;  the  length  of  the  sampling  window  and  number  of  observations  were  varied. 
The  results  are  shown  in  Table  3. 
Table 3.  Variance of Optimal Estimator at Values of $T$ and $N$

Sampling Interval (T),        Number of Observations (N)
in 1/rho Time Units        2       3       4       5      10
------------------------------------------------------------
        0.5              .8033   .8008   .8004   .8002   .8000
        1                .6839   .6712   .6687   .6678   .6669
        2                .5677   .5197   .5090   .5051   .5010
        3                .5249   .4405   .4191   .4109   .4022
        4                .5092   .3963   .3639   .3511   .3370
        5                .5034   .3708   .3282   .3107   .2909
       10                .5000   .3363   .2636   .2276   .1804

First  consider  each  column.  The  reciprocal  of  the  number  of  observations  is  the  lower  bound 
for  each  column;  it  is  the  variance  that  would  be  obtained  if  the  observations  were  independent. 
Moving  down  a  column  shows  the  effects  of  increasing  the  duration  of  the  sampling  window. 
Moving  across  a  row  shows  the  effects  of  adding  more  observations  to  a  fixed  sampling  interval. 
To  complement  the  values  in  the  table,  two  additional  cases  were  evaluated:  a  sample  window 
of  10  correlation  units  with  20  observations  results  in  a  variance  of  0.1698;  and,  for  the  same 
window, 30 observations result in a variance of 0.1680. As the number of observations is
increased, the variance seems to decrease asymptotically. As a general rule, the sampling rate
should be set to two observations per correlation time.
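The variance of the optimally weighted estimate can be computed for any window length and number of observations by solving one linear system. The sketch below is not Thompson's Case 7 procedure and not the program used for Table 3. It uses the standard generalized least squares form, with weights proportional to the solution z of Sigma z = 1 and variance 1 / sum(z). The helper names are hypothetical, and unit noise variance and equally spaced observations spanning the window are assumed.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def optimal_weights(n, t):
    """Return (weights, variance) for n observations over t correlation times."""
    alpha = math.exp(-t / (n - 1))
    sigma = [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]
    z = solve(sigma, [1.0] * n)          # z = Sigma^{-1} 1
    total = sum(z)
    return [v / total for v in z], 1.0 / total

for n in (2, 3, 5, 10, 20, 30):
    weights, variance = optimal_weights(n, 10.0)
    print(n, round(variance, 4))
```

Plain elimination is adequate at these sizes. For very dense sampling the matrix becomes ill-conditioned, as noted earlier, and a more careful solver would be needed.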

In  actual  situations,  the  size  of  the  observation  window  in  correlation  time  units  and  the 
intensity  of  noise  correlation  may  not  be  precisely  known,  or  either  may  vary  over  time.  Typically, 
it is impossible to precisely calculate the optimal weights beforehand. The amount of information
available for processing increases with the number of correlation time units over which
the observations are made, rather than with the number of samples.
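The dependence on window length rather than sample count can be made explicit. The following is a sketch that is not taken from the report: it assumes the standard tridiagonal inverse of the matrix with entries $\alpha^{|i-j|}$, and the subscript "opt" is introduced here only to mark the optimally weighted estimate. The closed form agrees with Equation 12 at $N = 3$.

```latex
\begin{align*}
\operatorname{Var}_{\mathrm{opt}}(\bar{Y})
  &= \frac{1}{\mathbf{1}'\,\Sigma^{-1}\mathbf{1}}
   = \frac{1+\alpha}{N-(N-2)\alpha},
  \qquad \alpha = e^{-\rho T/(N-1)}, \\
\lim_{N\to\infty} \operatorname{Var}_{\mathrm{opt}}(\bar{Y})
  &= \frac{2}{2+\rho T},
  \qquad\text{so that}\qquad
  \frac{1}{\operatorname{Var}_{\mathrm{opt}}(\bar{Y})} \;\to\; 1 + \frac{\rho T}{2}.
\end{align*}
```

Under these assumptions the limiting variance for a window of 10 correlation time units is $1/6 \approx 0.1667$, so no amount of additional sampling within the window can push the variance below that value; only a longer window can.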

5.  CONCLUSIONS 


When  the  interval  to  collect  data  is  small,  the  possibility  of  correlated  data  should  be 
addressed.  If  the  observations  are  correlated,  the  length  of  the  sampling  window  may  need  to 
be  increased.  For  an  active  protection  system  sensor  this  indicates  that  extending  the  initial 
detection  range  has  the  greatest  potential  to  increase  the  accuracy  of  the  system  if  the 
observations are correlated. The prudent approach would be to gather as much information as
possible  on  the  noise  processes  that  will  degrade  a  sensor  system’s  performance  and  then  decide 
how  many  correlation  time  units  are  needed  for  adequate  performance.  This  demands  knowledge 
of  the  energy  transformations  associated  with  a  specific  sensor,  and  the  effects  of  the  atmosphere 
or  propagation  media  on  the  signal.  The  amount  of  correlation  and  intensity  of  the  sensor  noise 
process  will  limit  the  information  that  can  be  extracted  over  a  short  time  period. 

If the formulas for uncorrelated observations are used when the observations are correlated,
the variance estimate will be too low. In effect, there are fewer independent observations.
Simulation  can  be  used  to  assess  the  effects  of  specific  noise  processes.  When  using  least 
squares  estimation,  the  covariance  estimate  of  the  parameters  will  be  deflated  if  the  sampling  rate 
introduces  correlation.  When  the  observations  are  correlated,  an  extension  of  the  observation 
window  is  the  most  effective  way  to  increase  performance. 
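The kind of simulation mentioned above can be sketched for the exponential correlation model. This is an illustration under stated assumptions, not a model of any particular sensor. The noise is generated by a first-order autoregressive recursion with unit variance, which has correlation alpha^|i-j| between observations. The function names, trial count, and seed are hypothetical, and the quantity 1 / Var is used here as an informal "effective number of independent observations".

```python
import math
import random

def var_mean(n, t):
    """Equation 7 with unit noise variance; t is in correlation time units."""
    alpha = math.exp(-t / (n - 1))
    total = n + 2.0 * sum((n - i) * alpha ** i for i in range(1, n))
    return total / n ** 2

def simulate_var_mean(n, t, trials=20000, seed=0):
    """Monte Carlo estimate of the variance of the equally weighted mean."""
    rng = random.Random(seed)
    alpha = math.exp(-t / (n - 1))
    scale = math.sqrt(1.0 - alpha * alpha)   # keeps the noise variance at 1
    means = []
    for _ in range(trials):
        v = rng.gauss(0.0, 1.0)
        total = v
        for _ in range(n - 1):
            v = alpha * v + scale * rng.gauss(0.0, 1.0)
            total += v
        means.append(total / n)
    centre = sum(means) / trials
    return sum((x - centre) ** 2 for x in means) / (trials - 1)

n, t = 8, 1.0
print("formula   :", round(var_mean(n, t), 4))
print("simulated :", round(simulate_var_mean(n, t), 4))
print("naive 1/N :", round(1.0 / n, 4))
print("effective number of observations:", round(1.0 / var_mean(n, t), 2))
```

With eight observations in one correlation time unit, the uncorrelated formula 1/N gives 0.125 while Equation 7 gives about 0.71, so the eight samples carry roughly the information of 1.4 independent ones.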